Skip to content

Automated Test: authz-service-improve-caching-pr #341

Conversation

admin-coderabbit
Copy link
Owner

@admin-coderabbit admin-coderabbit commented Feb 4, 2026

This pull request was automatically created by @coderabbitai/e2e-reviewer.

Batch created pull request.

Summary by CodeRabbit

  • Refactor

    • Restructured authorization caching to implement denial-specific caching and streamlined identity permission retrieval, reducing database queries on repeated authorization checks.
  • Tests

    • Expanded test coverage for cache behavior across authorization checks and list operations, including cache-hit, cache-miss, and expiration scenarios.

* remove the use of client side cache for in-proc authz client

Co-authored-by: Gabriel MABILLE <gabriel.mabille@grafana.com>

* add a permission denial cache, fetch perms if not in either of the caches

Co-authored-by: Gabriel MABILLE <gabriel.mabille@grafana.com>

* Clean up tests

Co-authored-by: Ieva <ieva.vasiljeva@grafana.com>

* Cache tests

Co-authored-by: Ieva <ieva.vasiljeva@grafana.com>

* Add test to list + cache

Co-authored-by: Ieva <ieva.vasiljeva@grafana.com>

* Add outdated cache test

Co-authored-by: Ieva <ieva.vasiljeva@grafana.com>

* Re-organize metrics

Co-authored-by: Ieva <ieva.vasiljeva@grafana.com>

---------

Co-authored-by: Gabriel MABILLE <gabriel.mabille@grafana.com>
@coderabbit-eval
Copy link

coderabbit-eval bot commented Feb 4, 2026

📝 Walkthrough

Walkthrough

The changes introduce a denial caching mechanism to the RBAC authorization service. A new NoopCache type is added for in-process paths, the Service gains a permDenialCache field, and permission checks now leverage cached identity permissions with explicit denial entries to reduce database calls.

Changes

Cohort / File(s) Summary
Cache Infrastructure
pkg/services/authz/rbac.go, pkg/services/authz/rbac/cache.go
Introduces NoopCache type implementing the cache interface for in-process channels and adds userPermDenialCacheKey function to generate denial-specific cache keys. Refactors RBAC client initialization to use authzlib.NewClient with appropriate cache strategies (NoopCache for in-proc, LocalCache for remote).
Service Implementation
pkg/services/authz/rbac/service.go
Adds permDenialCache field to Service and wires it into initialization. Introduces getCachedIdentityPermissions helper function to centralize identity-based permission caching. Refactors Check and List methods to check denial cache first and fall back to cached identity permissions, reducing database calls on cache hits.
Tests
pkg/services/authz/rbac/service_test.go
Removes legacy cache test case from TestService_getUserPermissions. Adds TestService_CacheCheck and TestService_CacheList to validate caching behavior including cache hits, misses, and explicit denial entries. Updates setupService to initialize permDenialCache.

Sequence Diagram

sequenceDiagram
    participant Client
    participant Service
    participant DenialCache as Denial Cache
    participant IdentCache as Identity Cache
    participant Database

    Client->>Service: Check(permission)
    Service->>DenialCache: Get denial entry
    alt Denial cached
        DenialCache-->>Service: Denied (cached)
        Service-->>Client: Allowed: false
    else No denial cached
        Service->>IdentCache: Get cached permissions
        alt Cached permissions hit
            IdentCache-->>Service: Permissions (cached)
            Service->>Service: Evaluate permission
            alt Permission allowed
                Service-->>Client: Allowed: true
            else Permission denied
                Service->>DenialCache: Set denial entry
                Service-->>Client: Allowed: false
            end
        else Cache miss
            Service->>Database: Fetch identity permissions
            Database-->>Service: Permissions
            Service->>IdentCache: Store permissions
            Service->>Service: Evaluate permission
            alt Permission allowed
                Service-->>Client: Allowed: true
            else Permission denied
                Service->>DenialCache: Set denial entry
                Service-->>Client: Allowed: false
            end
        end
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰 A denial cache now guards the gate,
With speedy shortcuts, no more wait!
Identity permissions cached so tight,
The service dances left and right,
Permission checks now take their flight! ✨

🚥 Pre-merge checks | ✅ 1 | ❌ 1
❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Title check ❓ Inconclusive The title 'Automated Test: authz-service-improve-caching-pr' is overly vague and generic. It reads as a test identifier rather than a descriptive summary of the actual changes (adding denial caching, improving identity permission caching in the RBAC authorization service). Use a more descriptive title that captures the main change, such as 'Improve RBAC authorization caching with denial cache and identity permission optimization' or 'Add denial caching and optimize identity permission caching in RBAC service'.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch authz-service-improve-caching-pr

Comment @coderabbit eval help to get the list of available commands and usage tips.

@coderabbit-eval coderabbit-eval bot closed this Feb 4, 2026
Copy link

@coderabbit-eval coderabbit-eval bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@pkg/services/authz/rbac/service.go`:
- Around line 116-137: The permission cache metric is being recorded as a miss
even when getCachedIdentityPermissions(...) returns a cache hit but
checkPermission(...) yields allowed=false; update the logic around
permDenialKey, getCachedIdentityPermissions, and checkPermission so that
metrics.permissionCacheUsage reflects a cache hit in the err==nil branch (e.g.,
call s.metrics.permissionCacheUsage.WithLabelValues("true",
checkReq.Action).Inc() when cachedPerms were found regardless of allowed result,
or add a distinct label like "hit_denied") and only record "false" for real
cache misses outside that branch.
🧹 Nitpick comments (5)
pkg/services/authz/rbac/cache.go (1)

30-32: Cache key collision risk with _ separator.

The denial cache key shares the same prefix format as userPermCacheKey (namespace + ".perm_" + userUID + "_" + action), with name and parent appended using the same _ delimiter. If any field value ever contains _, keys could collide—both across the two key types and within denial keys themselves (e.g., name="a_b", parent="c" vs name="a", parent="b_c").

Current resource names/UIDs likely don't contain _, so the practical risk is low, but consider using a more robust separator (e.g., / or a multi-char delimiter) to make the key scheme collision-proof.

💡 Possible fix
 func userPermDenialCacheKey(namespace, userUID, action, name, parent string) string {
-	return namespace + ".perm_" + userUID + "_" + action + "_" + name + "_" + parent
+	return namespace + ".perm_denial_" + userUID + "/" + action + "/" + name + "/" + parent
 }

Using a distinct prefix (perm_denial_ vs perm_) and a different separator (/) eliminates any ambiguity with userPermCacheKey.

pkg/services/authz/rbac.go (1)

239-251: Receiver name lc is misleading for NoopCache.

The receiver is named lc (likely copied from a LocalCache pattern), but this is a NoopCache. Consider renaming to nc or c for consistency.

💡 Proposed fix
-func (lc *NoopCache) Get(ctx context.Context, key string) ([]byte, error) {
+func (nc *NoopCache) Get(ctx context.Context, key string) ([]byte, error) {
 	return nil, cache.ErrNotFound
 }
 
-func (lc *NoopCache) Set(ctx context.Context, key string, data []byte, exp time.Duration) error {
+func (nc *NoopCache) Set(ctx context.Context, key string, data []byte, exp time.Duration) error {
 	return nil
 }
 
-func (lc *NoopCache) Delete(ctx context.Context, key string) error {
+func (nc *NoopCache) Delete(ctx context.Context, key string) error {
 	return nil
 }
pkg/services/authz/rbac/service.go (2)

342-368: getCachedIdentityPermissions performs a DB call for user identifiers, partially negating the "cache-only" intent.

For TypeUser/TypeServiceAccount, GetUserIdentifiers (line 356) may hit the database if the identity isn't in idCache. This means the "fast cached path" can still trigger a DB round-trip just to construct the perm-cache key, even if the perm cache itself turns out to be empty. The idCache has a longer TTL (2 min) so this is mostly mitigated after the first request, but on a cold cache or after TTL expiry, every Check/List call pays this cost upfront.

If this is intentional, a brief code comment explaining the tradeoff would help future maintainers.


153-155: Denial cache entries may outlive the permission cache, creating a brief stale-denial window.

Both permCache and permDenialCache use shortCacheTTL (30s), but the denial entry is written after the perm entry (which is set inside getUserPermissions). When a permission is granted in the DB shortly after a denial is cached, the denial entry can outlive the perm entry by the delta between the two writes, causing the fast-path at line 117 to return a stale denial even after permCache refreshes.

In practice this delta is small (milliseconds), and the 30s TTL makes any staleness transient. However, it's worth documenting this known tradeoff (or setting the denial TTL slightly shorter than the perm TTL) so operators understand the behavior during permission changes.

pkg/services/authz/rbac/service_test.go (1)

973-995: Test doesn't conclusively prove the denial cache short-circuits.

The permCache entry has "dashboards:uid:dash1": false, which would also result in a denial through the normal permission-check path. If the denial cache were removed, the test would still pass (via the perm check + DB fallback path). To truly verify that the denial cache takes priority, set a permCache value that would allow the request:

-		s.permCache.Set(ctx, userPermCacheKey("org-12", "test-uid", "dashboards:read"), map[string]bool{"dashboards:uid:dash1": false})
+		s.permCache.Set(ctx, userPermCacheKey("org-12", "test-uid", "dashboards:read"), map[string]bool{"dashboards:uid:dash1": true})

This way, if the denial cache is bypassed, checkPermission would return allowed=true, and the test would fail—proving the denial cache is the decisive factor.

Comment on lines +116 to +137
permDenialKey := userPermDenialCacheKey(checkReq.Namespace.Value, checkReq.UserUID, checkReq.Action, checkReq.Name, checkReq.ParentFolder)
if _, ok := s.permDenialCache.Get(ctx, permDenialKey); ok {
s.metrics.permissionCacheUsage.WithLabelValues("true", checkReq.Action).Inc()
s.metrics.requestCount.WithLabelValues("false", "true", req.GetVerb(), req.GetGroup(), req.GetResource()).Inc()
return &authzv1.CheckResponse{Allowed: false}, nil
}

cachedPerms, err := s.getCachedIdentityPermissions(ctx, checkReq.Namespace, checkReq.IdentityType, checkReq.UserUID, checkReq.Action)
if err == nil {
allowed, err := s.checkPermission(ctx, cachedPerms, checkReq)
if err != nil {
ctxLogger.Error("could not check permission", "error", err)
s.metrics.requestCount.WithLabelValues("true", "true", req.GetVerb(), req.GetGroup(), req.GetResource()).Inc()
return deny, err
}
if allowed {
s.metrics.permissionCacheUsage.WithLabelValues("true", checkReq.Action).Inc()
s.metrics.requestCount.WithLabelValues("false", "true", req.GetVerb(), req.GetGroup(), req.GetResource()).Inc()
return &authzv1.CheckResponse{Allowed: allowed}, nil
}
}
s.metrics.permissionCacheUsage.WithLabelValues("false", checkReq.Action).Inc()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Misleading cache-usage metric when cache hits but check denies.

When getCachedIdentityPermissions returns cached permissions (cache hit) but checkPermission returns allowed=false, execution falls through to line 137 which unconditionally records a cache miss ("false"). The cache was found and consulted; the data simply didn't produce an allow. This inflates the cache-miss rate and deflates the cache-hit rate in monitoring, making it harder to assess actual cache effectiveness.

Consider recording the metric inside the if err == nil branch to reflect that the cache was hit, or use a distinct label (e.g., "hit_denied") to differentiate this path from a true cache miss.

💡 Proposed fix
 	cachedPerms, err := s.getCachedIdentityPermissions(ctx, checkReq.Namespace, checkReq.IdentityType, checkReq.UserUID, checkReq.Action)
 	if err == nil {
 		allowed, err := s.checkPermission(ctx, cachedPerms, checkReq)
 		if err != nil {
 			ctxLogger.Error("could not check permission", "error", err)
 			s.metrics.requestCount.WithLabelValues("true", "true", req.GetVerb(), req.GetGroup(), req.GetResource()).Inc()
 			return deny, err
 		}
 		if allowed {
 			s.metrics.permissionCacheUsage.WithLabelValues("true", checkReq.Action).Inc()
 			s.metrics.requestCount.WithLabelValues("false", "true", req.GetVerb(), req.GetGroup(), req.GetResource()).Inc()
 			return &authzv1.CheckResponse{Allowed: allowed}, nil
 		}
+		// Cache was present but didn't grant access — still a cache hit (just not conclusive for allow)
+		s.metrics.permissionCacheUsage.WithLabelValues("false", checkReq.Action).Inc()
+	} else {
+		s.metrics.permissionCacheUsage.WithLabelValues("false", checkReq.Action).Inc()
 	}
-	s.metrics.permissionCacheUsage.WithLabelValues("false", checkReq.Action).Inc()
🤖 Prompt for AI Agents
In `@pkg/services/authz/rbac/service.go` around lines 116 - 137, The permission
cache metric is being recorded as a miss even when
getCachedIdentityPermissions(...) returns a cache hit but checkPermission(...)
yields allowed=false; update the logic around permDenialKey,
getCachedIdentityPermissions, and checkPermission so that
metrics.permissionCacheUsage reflects a cache hit in the err==nil branch (e.g.,
call s.metrics.permissionCacheUsage.WithLabelValues("true",
checkReq.Action).Inc() when cachedPerms were found regardless of allowed result,
or add a distinct label like "hit_denied") and only record "false" for real
cache misses outside that branch.

Sign in to join this conversation on GitHub.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant